netdev CI testing #6666

kuba-moo · 2024-03-27T20:02:33Z

Reusable PR for hooking netdev CI to BPF testing.

A number of dwmac variants from Rockchip SoCs have turned up in the Rockchip-specific binding, but not in the main list in snps,dwmac.yaml, which, as the comment there indicates, is needed for accurate matching. So add the missing rk3528, rk3568 and rv1126 to the main list. Reviewed-by: Andrew Lunn <[email protected]> Acked-by: Conor Dooley <[email protected]> Signed-off-by: Heiko Stuebner <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Rockchip RK3506 has two Ethernet controllers based on Synopsys DWC Ethernet QoS IP. Add compatible string for the RK3506 variant. Reviewed-by: Andrew Lunn <[email protected]> Acked-by: Conor Dooley <[email protected]> Signed-off-by: Heiko Stuebner <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Add the needed glue blocks for the RK3506-specific setup. The RK3506 dwmac only supports speeds up to 100 Mbit/s with an RMII PHY and does not support RGMII. Signed-off-by: David Wu <[email protected]> Signed-off-by: Heiko Stuebner <[email protected]> Signed-off-by: NipaLocal <nipa@local>
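As an illustration of what an "RMII only, up to 100 Mbit/s" constraint means for a glue layer, here is a minimal sketch assuming a per-SoC capability record. The names `sketch_soc_caps`, `sketch_check_config()` and the chosen error codes are hypothetical and are not the dwmac-rk API; only the RK3506 limits come from the commit message.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the kernel's PHY interface modes. */
enum sketch_phy_mode {
	SKETCH_MODE_RMII,
	SKETCH_MODE_RGMII,
};

/* Hypothetical per-SoC capability record for a dwmac glue layer. */
struct sketch_soc_caps {
	const char *name;
	int supports_rgmii;	/* nonzero if RGMII is available */
	int max_speed_mbps;	/* highest link speed the MAC can do */
};

/* RK3506 values taken from the commit message: RMII only, 100 Mbit/s. */
static const struct sketch_soc_caps rk3506_caps = {
	.name = "rk3506",
	.supports_rgmii = 0,
	.max_speed_mbps = 100,
};

/* Hypothetical check: refuse modes and speeds the SoC cannot provide. */
static int sketch_check_config(const struct sketch_soc_caps *caps,
			       enum sketch_phy_mode mode, int speed_mbps)
{
	if (mode == SKETCH_MODE_RGMII && !caps->supports_rgmii)
		return -EOPNOTSUPP;
	if (speed_mbps > caps->max_speed_mbps)
		return -EINVAL;
	return 0;
}
```

With these values, an RMII link at 100 Mbit/s is accepted, while any RGMII request or a 1000 Mbit/s speed is rejected.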

The dwmac-rk glue driver is currently not caught by the general maintainer entry for Rockchip SoCs, so add it explicitly, similar to the i2c driver. The binding document in net/rockchip-dwmac.yaml already gets caught by the wildcard match. Signed-off-by: Heiko Stuebner <[email protected]> Signed-off-by: NipaLocal <nipa@local>

When the server has MPTCP enabled but receives a non-MP-capable request from a client, it calls mptcp_fallback_tcp_ops(). Since non-MPTCP connections are allowed to use sockmap, which replaces sk->sk_prot, using sk->sk_prot to determine the IP version in mptcp_fallback_tcp_ops() becomes unreliable. This can lead to assigning incorrect ops to sk->sk_socket->ops. Additionally, when BPF Sockmap modifies the protocol handlers, the original WARN_ON_ONCE(sk->sk_prot != &tcp_prot) check would falsely trigger warnings. Fix this by using the more stable sk_family to distinguish between IPv4 and IPv6 connections, ensuring correct fallback protocol operations are selected even when BPF Sockmap has modified the socket protocol handlers. Fixes: 0b4f33d ("mptcp: fix tcp fallback crash") Cc: <[email protected]> Signed-off-by: Jiayuan Chen <[email protected]> Reviewed-by: Jakub Sitnicki <[email protected]> Signed-off-by: NipaLocal <nipa@local>
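The difference between the old and fixed selection logic can be modelled in plain C. This is a sketch, not the MPTCP code: `proto_sketch`, `proto_ops_sketch`, `sock_sketch` and the two `fallback_*_sketch()` functions are hypothetical stand-ins for struct proto, struct proto_ops, struct sock and mptcp_fallback_tcp_ops(). Only AF_INET/AF_INET6 are real constants. It shows that an IPv6 socket whose sk_prot was replaced by sockmap gets the wrong ops and a spurious warning under the old logic, but the right ops when sk_family is used.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/* Hypothetical stand-ins for struct proto / struct proto_ops tables. */
struct proto_sketch { const char *name; };
struct proto_ops_sketch { const char *name; };

static const struct proto_sketch tcp_prot_s = { "tcp_prot" };
static const struct proto_sketch tcpv6_prot_s = { "tcpv6_prot" };
static const struct proto_sketch sockmap_prot_s = { "sockmap-replaced prot" };
static const struct proto_ops_sketch inet_stream_ops_s = { "inet_stream_ops" };
static const struct proto_ops_sketch inet6_stream_ops_s = { "inet6_stream_ops" };

/* Hypothetical socket model. */
struct sock_sketch {
	unsigned short sk_family;		/* fixed when the socket is created */
	const struct proto_sketch *sk_prot;	/* sockmap may swap this out */
	const struct proto_ops_sketch *ops;	/* stands in for sk->sk_socket->ops */
};

static int warn_count;	/* counts WARN_ON_ONCE-style hits in the old logic */

/* Old shape: infer the IP version from sk_prot. */
static void fallback_by_prot_sketch(struct sock_sketch *sk)
{
	if (sk->sk_prot == &tcpv6_prot_s) {
		sk->ops = &inet6_stream_ops_s;
	} else {
		if (sk->sk_prot != &tcp_prot_s)
			warn_count++;	/* the false warning */
		sk->ops = &inet_stream_ops_s;
	}
}

/* Fixed shape: infer the IP version from sk_family. */
static void fallback_by_family_sketch(struct sock_sketch *sk)
{
	if (sk->sk_family == AF_INET6)
		sk->ops = &inet6_stream_ops_s;
	else
		sk->ops = &inet_stream_ops_s;
}
```

sk_family is a good key here because, in this model as in the commit's reasoning, it is set once at creation while sk_prot can be replaced at runtime.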

MPTCP creates subflows for data transmission, and these sockets should not be added to sockmap because MPTCP sets specialized data_ready handlers that would be overridden by sockmap. Additionally, for the parent socket of MPTCP subflows (plain TCP socket), the MPTCP sk requires specific protocol handling (mptcp_prot) that conflicts with sockmap's operation. This patch adds proper checks to prevent MPTCP subflows and their parent sockets from being added to sockmap, while preserving compatibility with reuseport functionality for listening MPTCP sockets. We cannot add this logic to sock_map_sk_state_allowed() because the sockops path doesn't execute this function, and the socket state coming from sockops might be in states like SYN_RECV. So moving sock_map_sk_state_allowed() to sock_{map,hash}_update_common() is not appropriate. Instead, we introduce a new function to handle MPTCP checks. Fixes: 0b4f33d ("mptcp: fix tcp fallback crash") Cc: <[email protected]> Signed-off-by: Jiayuan Chen <[email protected]> Suggested-by: Jakub Sitnicki <[email protected]> Signed-off-by: NipaLocal <nipa@local>
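A rough model of such a separate MPTCP gate, written as a sketch. The struct, the state enum, `sockmap_mptcp_check_sketch()` and the -EOPNOTSUPP return value are all hypothetical. The exception for listening MPTCP sockets is an assumption based on the "reuseport functionality for listening MPTCP sockets" wording, not a copy of the patch. The point it illustrates is that the check depends only on the socket's MPTCP role, so it also works for sockops-originated sockets in states such as SYN_RECV.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of TCP states. */
enum sk_state_sketch { ST_ESTABLISHED, ST_SYN_RECV, ST_LISTEN };

/* Hypothetical socket model. */
struct sock_sketch {
	bool is_mptcp_subflow;	/* TCP subflow owned by an MPTCP connection */
	bool is_mptcp_sk;	/* MPTCP-level socket (mptcp_prot) */
	enum sk_state_sketch state;
};

/* Hypothetical MPTCP gate, intended to run from the common update path
 * rather than from sock_map_sk_state_allowed(), so sockops-originated
 * sockets are covered too. */
static int sockmap_mptcp_check_sketch(const struct sock_sketch *sk)
{
	if (sk->is_mptcp_subflow)
		return -EOPNOTSUPP;	/* would clobber MPTCP's data_ready */
	if (sk->is_mptcp_sk && sk->state != ST_LISTEN)
		return -EOPNOTSUPP;	/* assumption: listeners stay allowed */
	return 0;			/* plain TCP, including MPTCP fallback */
}
```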

Add test cases to verify that when MPTCP falls back to plain TCP sockets, they can properly work with sockmap. Additionally, add test cases to ensure that sockmap correctly rejects MPTCP sockets as expected. Signed-off-by: Jiayuan Chen <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Currently, in hclge_mii_ioctl(), the operation to read the PHY register (SIOCGMIIREG) always returns 0. This patch changes the return type of hclge_read_phy_reg(), returning an error code when the function fails. Fixes: 024712f ("net: hns3: add ioctl support for imp-controlled PHYs") Signed-off-by: Jijie Shao <[email protected]> Signed-off-by: NipaLocal <nipa@local>
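The pattern behind this change (a value-returning reader cannot report failure, so switch to a status return plus an out-parameter) can be sketched as follows. The firmware callback type `fw_read_fn`, the fake backends and all `*_old`/`*_new` functions are hypothetical and do not mirror the hns3 signatures. The sketch only shows how the old SIOCGMIIREG path reports success on a failed read while the new one propagates the error.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical firmware accessor: fills *val and returns 0,
 * or returns a negative errno. */
typedef int (*fw_read_fn)(uint16_t reg, uint16_t *val);

/* Old shape (sketch): a u16 return value has no room for an error. */
static uint16_t read_phy_reg_old(fw_read_fn fw, uint16_t reg)
{
	uint16_t val = 0;

	(void)fw(reg, &val);	/* failure silently dropped */
	return val;
}

/* New shape (sketch): return a status, pass the value via an out-param. */
static int read_phy_reg_new(fw_read_fn fw, uint16_t reg, uint16_t *val)
{
	return fw(reg, val);
}

/* Hypothetical SIOCGMIIREG handling, before and after. */
static int mii_ioctl_old(fw_read_fn fw, uint16_t reg, uint16_t *val_out)
{
	*val_out = read_phy_reg_old(fw, reg);
	return 0;		/* always "success" */
}

static int mii_ioctl_new(fw_read_fn fw, uint16_t reg, uint16_t *val_out)
{
	return read_phy_reg_new(fw, reg, val_out);
}

/* Two fake firmware backends for exercising the paths above. */
static int fw_ok(uint16_t reg, uint16_t *val)
{
	*val = (uint16_t)(0x7800 | reg);
	return 0;
}

static int fw_fail(uint16_t reg, uint16_t *val)
{
	(void)reg;
	(void)val;
	return -ETIMEDOUT;
}
```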

Currently, when debugfs and reset are executed concurrently, some resources are released during the reset process, which may cause debugfs to read null pointers or hit other anomalies. Therefore, this patch adds protection that intercepts reset-sensitive debugfs operations while a reset is in progress. Fixes: eced3d1 ("net: hns3: use seq_file for files in queue/ in debugfs") Signed-off-by: Jijie Shao <[email protected]> Signed-off-by: NipaLocal <nipa@local>
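A minimal sketch of the general idea, assuming a device-level "resetting" flag that reset-sensitive debugfs handlers consult before dereferencing reset-managed resources. `dev_sketch`, `ring_sketch`, `dbg_read_ring_sketch()` and the -EBUSY return are hypothetical, not the hns3 implementation. A real driver also needs locking or a similar guarantee so the state cannot change between the check and the access; the sketch omits that and only shows where the check sits.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ring state that a reset frees and reallocates. */
struct ring_sketch {
	int head;
	int tail;
};

/* Hypothetical device: 'resetting' is set for the duration of a reset. */
struct dev_sketch {
	bool resetting;
	struct ring_sketch *rings;
};

/* Hypothetical reset-sensitive debugfs read. It refuses to touch
 * resources while a reset may be tearing them down. Locking that keeps
 * the flag stable between check and use is omitted here. */
static int dbg_read_ring_sketch(const struct dev_sketch *dev, int *head_out)
{
	if (dev->resetting || dev->rings == NULL)
		return -EBUSY;
	*head_out = dev->rings[0].head;
	return 0;
}
```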

In efx_mae_enumerate_mports(), memory allocated for mae_mport_desc is passed as an argument to efx_mae_process_mport(), but when the error path in efx_mae_process_mport() is executed, the memory allocated for desc is leaked. Fix that by freeing the memory allocation before returning an error. Fixes: a6a15ac ("sfc: enumerate mports in ef100") Acked-by: Edward Cree <[email protected]> Signed-off-by: Abdun Nihaal <[email protected]> Signed-off-by: NipaLocal <nipa@local>
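The ownership rule behind this fix can be sketched in plain C: the callee takes ownership of `desc`, so on success it stores it and on failure it must free it before returning. Everything here (`mport_desc_sketch`, the fixed-size table, the "negative id" failure condition and the live-allocation counter) is hypothetical scaffolding for the illustration, not the sfc code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical descriptor and owning table. */
struct mport_desc_sketch { int id; };

static struct mport_desc_sketch *table_sketch[8];
static int table_len;
static int live_descs;	/* outstanding allocations, for checking leaks */

static struct mport_desc_sketch *desc_alloc(int id)
{
	struct mport_desc_sketch *d = malloc(sizeof(*d));

	if (d) {
		d->id = id;
		live_descs++;
	}
	return d;
}

static void desc_free(struct mport_desc_sketch *d)
{
	if (d) {
		live_descs--;
		free(d);
	}
}

/* Hypothetical processing step that takes ownership of desc. */
static int process_mport_sketch(struct mport_desc_sketch *desc)
{
	if (desc->id < 0 || table_len == 8) {	/* hypothetical failure */
		desc_free(desc);	/* the fix: free before returning */
		return -EINVAL;
	}
	table_sketch[table_len++] = desc;	/* table now owns desc */
	return 0;
}

/* Hypothetical enumeration loop mirroring the caller. */
static int enumerate_sketch(const int *ids, int n)
{
	for (int i = 0; i < n; i++) {
		struct mport_desc_sketch *d = desc_alloc(ids[i]);

		if (!d)
			return -ENOMEM;
		int rc = process_mport_sketch(d);
		if (rc)
			return rc;
	}
	return 0;
}
```

After a failed second entry, the only live allocation is the one held by the table, so nothing leaks.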

The changes introduced in commit dc82a33 ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops") have been found to cause a race condition in production environments. Under specific circumstances, observed exclusively on ARM64 (aarch64) systems with Ampere Altra Max CPUs, a transmit queue (TXQ) can become permanently stalled. This happens when the race condition leads to the TXQ entering the QUEUE_STATE_DRV_XOFF state without a corresponding queue wake-up, preventing the attached qdisc from dequeueing packets and causing the network link to halt. As a first step towards resolving this issue, this patch introduces a failsafe mechanism. It enables the net device watchdog by setting a timeout value and implements the .ndo_tx_timeout callback. If a TXQ stalls, the watchdog will trigger the veth_tx_timeout() function, which logs a warning and calls netif_tx_wake_queue() to unstall the queue and allow traffic to resume. The log message will look like this:

  veth42: NETDEV WATCHDOG: CPU: 34: transmit queue 0 timed out 5393 ms
  veth42: veth backpressure stalled(n:1) TXQ(0) re-enable

This provides a necessary recovery mechanism while the underlying race condition is investigated further. Subsequent patches will address the root cause and add more robust state handling in ndo_open/ndo_stop. Fixes: dc82a33 ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops") Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: NipaLocal <nipa@local>
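The failsafe can be modelled as a watchdog that only fires when a queue is both stopped and stale, plus a timeout handler that logs and wakes it. This is a single-threaded sketch, not veth code: `txq_sketch`, `watchdog_sketch()` and `tx_timeout_sketch()` are hypothetical analogues of the netdev watchdog, .ndo_tx_timeout and netif_tx_wake_queue(), and the printed line only mimics the second log line quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-queue state. */
struct txq_sketch {
	bool drv_xoff;			/* models QUEUE_STATE_DRV_XOFF */
	unsigned long trans_start;	/* time (ms) of last progress */
	unsigned int stall_count;	/* how many times it was rescued */
};

/* Hypothetical .ndo_tx_timeout analogue: log, then wake the queue. */
static void tx_timeout_sketch(struct txq_sketch *q, int qidx)
{
	q->stall_count++;
	printf("veth backpressure stalled(n:%u) TXQ(%d) re-enable\n",
	       q->stall_count, qidx);
	q->drv_xoff = false;	/* netif_tx_wake_queue() analogue */
}

/* Hypothetical watchdog analogue: fire only for a stopped, stale queue.
 * Assumes now >= trans_start. */
static bool watchdog_sketch(struct txq_sketch *q, int qidx,
			    unsigned long now, unsigned long timeout)
{
	if (q->drv_xoff && now - q->trans_start > timeout) {
		tx_timeout_sketch(q, qidx);
		return true;
	}
	return false;
}
```

A running queue is never touched, so the handler only acts as a recovery path and does not change normal behaviour.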

The veth driver started manipulating TXQ states in commit dc82a33 ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops"). Other drivers that manipulate TXQ states take care of stopping and starting TXQs in their NDOs. Thus, add this to veth's .ndo_open and .ndo_stop. Fixes: dc82a33 ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops") Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: NipaLocal <nipa@local>
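The open/stop symmetry can be sketched as follows: stop every TXQ when the device goes down, and start every TXQ when it comes up, so a stale XOFF bit left over from backpressure cannot survive a down/up cycle. The `*_sketch` names are hypothetical. The two queue helpers are modelled loosely on the kernel's netif_tx_start_all_queues()/netif_tx_stop_all_queues(), but the actual veth patch may use different calls.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_TXQ_SKETCH 4

/* Hypothetical device model. */
struct netdev_sketch {
	bool up;
	bool txq_stopped[NUM_TXQ_SKETCH];	/* models per-queue DRV_XOFF */
};

/* Analogue of netif_tx_start_all_queues(). */
static void start_all_queues_sketch(struct netdev_sketch *dev)
{
	for (int i = 0; i < NUM_TXQ_SKETCH; i++)
		dev->txq_stopped[i] = false;
}

/* Analogue of netif_tx_stop_all_queues(). */
static void stop_all_queues_sketch(struct netdev_sketch *dev)
{
	for (int i = 0; i < NUM_TXQ_SKETCH; i++)
		dev->txq_stopped[i] = true;
}

static int ndo_open_sketch(struct netdev_sketch *dev)
{
	dev->up = true;
	start_all_queues_sketch(dev);	/* never come up with a stale XOFF */
	return 0;
}

static int ndo_stop_sketch(struct netdev_sketch *dev)
{
	stop_all_queues_sketch(dev);	/* quiesce TX before going down */
	dev->up = false;
	return 0;
}

static bool any_txq_stopped_sketch(const struct netdev_sketch *dev)
{
	for (int i = 0; i < NUM_TXQ_SKETCH; i++)
		if (dev->txq_stopped[i])
			return true;
	return false;
}
```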

Commit dc82a33 ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops") introduced a race condition that can lead to a permanently stalled TXQ. This was observed in production on ARM64 systems (Ampere Altra Max).

The race occurs in veth_xmit(). The producer observes a full ptr_ring and stops the queue (netif_tx_stop_queue()). The subsequent conditional logic, intended to re-wake the queue if the consumer had just emptied it (if (__ptr_ring_empty(...)) netif_tx_wake_queue()), can fail. This leads to a "lost wakeup" where the TXQ remains stopped (QUEUE_STATE_DRV_XOFF) and traffic halts.

This failure is caused by an incorrect use of the __ptr_ring_empty() API from the producer side. As noted in kernel comments, this check is not guaranteed to be correct if a consumer is operating on another CPU. The empty test is based on ptr_ring->consumer_head, making it reliable only for the consumer. Using this check from the producer side is fundamentally racy.

This patch fixes the race by adopting the more robust logic from an earlier version V4 of the patchset, which always flushed the peer:

(1) In veth_xmit(), the racy conditional wake-up logic and its memory barrier are removed. Instead, after stopping the queue, we unconditionally call __veth_xdp_flush(rq). This guarantees that the NAPI consumer is scheduled, making it solely responsible for re-waking the TXQ.

(2) On the consumer side, the logic for waking the peer TXQ is centralized. It is moved out of veth_xdp_rcv() (which processes a batch) and placed at the end of the veth_poll() function. This ensures netif_tx_wake_queue() is called once per complete NAPI poll cycle.

(3) Finally, the NAPI completion check in veth_poll() is updated. If NAPI is about to complete (napi_complete_done), it now also checks if the peer TXQ is stopped. If the ring is empty but the peer TXQ is stopped, NAPI will reschedule itself. This prevents a new race where the producer stops the queue just as the consumer is finishing its poll, ensuring the wakeup is not missed.

Fixes: dc82a33 ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops") Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: NipaLocal <nipa@local>
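The three steps of this fix can be modelled in a small single-threaded C sketch. It is not the veth code: the ring is a plain FIFO standing in for ptr_ring, `txq_stopped` and `napi_scheduled` are booleans standing in for QUEUE_STATE_DRV_XOFF and a pending NAPI poll, there are no memory barriers, and the concurrent producer is injected through a hypothetical hook instead of a second CPU. All `*_sketch` names are hypothetical. It shows why step (3) matters: a queue stopped after the wake step, whose own NAPI kick is swallowed because the poll is still running, is caught by the completion re-check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4

/* Hypothetical model of one veth peer pair. */
struct veth_sketch {
	int ring[RING_SIZE];	/* bounded FIFO standing in for ptr_ring */
	int head, tail, count;
	bool txq_stopped;	/* models QUEUE_STATE_DRV_XOFF on the peer TXQ */
	bool napi_scheduled;	/* models a pending NAPI poll */
};

static bool ring_produce(struct veth_sketch *v, int pkt)
{
	if (v->count == RING_SIZE)
		return false;
	v->ring[v->tail] = pkt;
	v->tail = (v->tail + 1) % RING_SIZE;
	v->count++;
	return true;
}

static bool ring_consume(struct veth_sketch *v, int *pkt)
{
	if (v->count == 0)
		return false;
	*pkt = v->ring[v->head];
	v->head = (v->head + 1) % RING_SIZE;
	v->count--;
	return true;
}

/* (1) Producer: on a full ring, stop the queue and unconditionally kick
 * the consumer (the __veth_xdp_flush() step). No emptiness re-check. */
static bool xmit_sketch(struct veth_sketch *v, int pkt)
{
	if (v->txq_stopped)
		return false;		/* the qdisc would hold the packet */
	if (!ring_produce(v, pkt)) {
		v->txq_stopped = true;
		v->napi_scheduled = true;
		return false;
	}
	v->napi_scheduled = true;
	return true;
}

typedef void (*race_hook_fn)(struct veth_sketch *v);

/* Hypothetical racing producer: stops the queue late; its NAPI kick is
 * lost because the poll is still in progress, so it sets nothing else. */
static void race_stop_queue_sketch(struct veth_sketch *v)
{
	v->txq_stopped = true;
}

/* Consumer: one NAPI poll; returns the number of packets processed. */
static int poll_sketch(struct veth_sketch *v, int budget, race_hook_fn hook)
{
	int done = 0, pkt;

	v->napi_scheduled = false;	/* this poll is now running */
	while (done < budget && ring_consume(v, &pkt))
		done++;

	/* (2) Release backpressure once, after the whole batch. */
	if (v->txq_stopped)
		v->txq_stopped = false;

	if (hook)
		hook(v);		/* inject the late producer */

	/* (3) Before completing: pending work or a stopped peer TXQ
	 * means poll again, otherwise the wakeup would be lost. */
	if (done == budget || v->count > 0 || v->txq_stopped)
		v->napi_scheduled = true;
	return done;
}
```

In the test scenario, the ring fills, the fifth transmit stops the queue, one poll drains and wakes it, a second poll suffers the injected late stop and reschedules itself, and the third poll wakes the queue again.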

Alex will send phylink patches soon which will make us link up on QEMU again, but for now let's hack up the link. Gives us a chance to add another QEMU NIC test to "HW" runners in the CI. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Let's see if this increases stability of timing-related results. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Signed-off-by: NipaLocal <nipa@local>

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

These are unlikely to matter for CI testing and they slow things down. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

tc_actions.sh keeps hanging the forwarding tests. sdf@: tdc & tdc-dbg started intermittently failing around Sep 25th Signed-off-by: NipaLocal <nipa@local>

Signed-off-by: NipaLocal <nipa@local>

We exclusively use headless VMs today, don't waste time compiling sound and GPU drivers. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

kmemleak auto scan could be a source of latency for the tests. We run a full scan after the tests manually, we don't need the autoscan thread to be enabled. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

… HEAD

kuba-moo force-pushed the to-test branch from 6bd5e75 to bdd05e2 on March 27, 2024 21:49

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch 3 times, most recently from 4f22ee0 to 8a9a8e0 on March 28, 2024 04:46

kuba-moo force-pushed the to-test branch 11 times, most recently from 64c403f to 8da1f58 on March 29, 2024 00:01

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch 3 times, most recently from 78ebb17 to 9325308 on March 29, 2024 02:14

kuba-moo force-pushed the to-test branch 6 times, most recently from c8c7b2f to a71aae6 on March 29, 2024 18:01

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from 9325308 to 7940ae1 on March 29, 2024 18:12

kuba-moo force-pushed the to-test branch 2 times, most recently from d8feb00 to b16a6b9 on March 30, 2024 00:01

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from 7940ae1 to 8f1ff3c on March 30, 2024 00:21

kuba-moo force-pushed the to-test branch 2 times, most recently from 4164329 to c5cecb3 on March 30, 2024 06:00

mmind and others added 27 commits October 23, 2025 11:00

nipa: disable random kunit tests

8135238

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

nipa: disable 6.17's merge window kunit tests

5b78bfc

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

nipa: config: x86: use periodic HZ tick

b516b07

Let's see if this increases stability of timing-related results. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

nipa: profile (time) test output

62dc58e

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

nipa: timestamp - try waking

d4cf2dd

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

nipa: dbg: tests: bonding: print info on failure

829d020

Signed-off-by: NipaLocal <nipa@local>

nipa: selftests: net: enable profiling

b182584

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

nipa: tc_action dbg

3021f14

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

nipa: config: disable CPU_MITIGATIONS

368779a

These are unlikely to matter for CI testing and they slow things down. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

nipa: forwarding: set timeout to 3 hours

a193e4e

tc_actions.sh keeps hanging the forwarding tests. sdf@: tdc & tdc-dbg started intermittently failing around Sep 25th Signed-off-by: NipaLocal <nipa@local>

nipa: drv: net: add timeout

d9a38d5

Signed-off-by: NipaLocal <nipa@local>

nipa: config: x86: disable GPUs and sound

55e43f1

We exclusively use headless VMs today, don't waste time compiling sound and GPU drivers. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

nipa: config: disable kmemleak auto scan

47f2641

kmemleak auto scan could be a source of latency for the tests. We run a full scan after the tests manually, we don't need the autoscan thread to be enabled. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

kuba-moo force-pushed the to-test branch from d3ca081 to 89e438c on October 23, 2025 18:01

Merge remote-tracking branch 'origin/net-next-2025-10-23--18-00' into…

0028825

… HEAD

kuba-moo force-pushed the to-test branch from 89e438c to 0028825 on October 23, 2025 18:31

59 participants
